SERVICE CONTROL MANAGER TOOL EXECUTION

Technical Field

The present invention relates to system administration management, and, in particular, to service control manager modules.

Background

Computer systems are increasingly becoming commonplace in homes and businesses throughout the world. As the number of computer systems increases, more and more computer systems are becoming interconnected via networks. These networks include local area networks (LANs). LANs also frequently have an interface to other networks, such as the Internet, and this interface needs to be monitored and controlled by network management on the LAN.

One concern encountered with networks is referred to as network management. Network management refers to monitoring and controlling of the network devices and includes the ability for an individual, typically referred to as an administrative user, to be able to access, monitor, and control the devices that are part of the network, or access, monitor, and control the devices that are part of the network coupled to other computer systems. Such access, monitoring, and control often include the ability to check the operating status of devices, receive error information for devices, change configuration values, and perform other management functions. As the size of networks increases, so too does the need for network management.

The operating system of most computers provides an administration tool or a system administration manager for invoking and performing system management tasks. The hardware of a computer system, the various facilities included within the operating system, such as the file system facility, the print spooling facility, and the networking facility, as well as the operating system itself must all be managed. This means that computer systems require some involvement by a human user or a manager of the computer system for such operations as specifying certain configuration parameters, monitoring ongoing activity, or troubleshooting some problem that has arisen. These management or administration tasks can be performed manually in many operating systems via direct manipulation of configuration files or direct invocation of specific administration utility programs. But in large operating systems involving distributed systems, a more efficient method for managing and monitoring tasks may be needed, especially in the context of tool execution.

Summary

A service control manager (SCM) tool execution mechanism may enable SCM users to execute the SCM tools across a set of defined distributed nodes (systems) by providing a secure mechanism, referred to as a distributed task facility (DTF), to integrate different operations, such as commands or scripts, and execute the operations across a set of distributed nodes.

The SCM tool execution method may include receiving a request, which includes task information, from a user through a client to run a tool on one or more nodes, retrieving tool definition, node definition and user definition from a domain manager, and validating the task information received from the user. A runnable tool may be created based on the task information and the tool definition, and the SCM module may check user authorization to run the tool on all of the nodes requested, i.e., whether the user is assigned the roles associated with the tool on all of the nodes. The client may next pass the runnable tool to a DTF, which may then issue a task identifier based on the runnable tool, and pass the runnable tool to agents associated with the nodes to execute the tool. Finally, the DTF may collect task results or failure reports from the agents, and return the task results to the client and then to the user.

Description of the Drawings



The detailed description refers to the following drawings, in which like numbers refer to like elements, and in which:

Figure 1 illustrates a computer network system with which the present invention may be used;

Figure 2 illustrates the relationships between the user, role, node, tool and authorization objects;

Figure 3 illustrates the relationships between clients, a DTF and agents running on the nodes; and

Figure 4 is a flow chart of a method for executing tools in the SCM module.

Detailed Description

A service control manager (SCM) module multiplies system administration effectiveness by distributing the effects of existing tools efficiently across managed servers. The phrase "service control manager" is intended as a label only, and different labels can be used to describe modules or other entities having the same or similar functions.

In the SCM domain, the managed servers (systems) are referred to as "managed nodes" or simply as "nodes". SCM node groups are collections of nodes in the SCM module. They may have overlapping memberships, such that a single node may be a member of more than one group. The grouping mechanism may allow flexible partitioning of the SCM module so that users may use it to reflect the way nodes are already grouped in their environment.

Figure 1 illustrates a computer network system with which the present invention may be used. The network system includes an SCM 110 running on a Central Management Server (CMS) 100 and one or more nodes 130 or node groups 132 managed by the SCM 110. The one or more nodes 130 and node groups 132 make up an SCM cluster 140. See ServiceControl Manager Technical Reference, HP® part number: B8339-90019, available from Hewlett-Packard Company, Palo Alto, CA., which is hereby incorporated by reference and which is also accessible at <http://www.software.hp.com/products/scmgr> for a more detailed description of the SCM 110.

The CMS 100 can be implemented with, for example, an HP-UX 11.x server running the SCM 110 software. The CMS 100 includes a memory 102, a secondary storage device (not shown), a processor 108, an input device (not shown), a display device (not shown), and an output device (not shown). The memory 102 may include computer readable media, RAM or similar types of memory, and it may store one or more applications for execution by processor 108, including the SCM 110 software. The secondary storage device may include computer readable media, a hard disk drive, floppy disk drive, CD-ROM drive, or other types of non-volatile data storage. The processor 108 executes the SCM software and other application(s), which are stored in memory or secondary storage, or received from the Internet or other network 116. The input device may include any device for entering data into the CMS 100, such as a keyboard, key pad, cursor-control device, touch-screen (possibly with a stylus), or microphone. The display device may include any type of device for presenting a visual image, such as, for example, a computer monitor, flat-screen display, or display panel. The output device may include any type of device for presenting data in hard copy format, such as a printer, and other types of output devices include speakers or any device for providing data in audio form. The CMS 100 can possibly include multiple input devices, output devices, and display devices.

The CMS 100 itself may be required to be a managed node, so that multi-system aware (MSA) (described later) tools may be invoked on the CMS. All other nodes 130 may need to be explicitly added to the SCM cluster 140.

Generally, the SCM 110 supports managing a single SCM cluster 140 from a single CMS 100. All tasks performed on the SCM cluster 140 are initiated on the CMS 100 either directly or remotely, for example, by reaching the CMS 100 via a web connection 114. Therefore, the workstation 120 at which a user sits only needs a web connection 114 over a network 116, such as the Internet or other type of computer network, to the CMS 100 in order to perform tasks on the SCM cluster 140. The CMS 100 preferably also includes a centralized data repository 104 for the SCM cluster 140, a web server 112 that allows web access to the SCM 110 and a depot 106 that includes products used in the configuring of nodes 130. A user interface may only run on the CMS 100, and no other node 130 in the SCM module may execute remote tasks, access the repository 104, or perform any other SCM operations.

Although the CMS 100 is depicted with various components, one skilled in the art will appreciate that this server can contain additional or different components. In addition, although aspects of an implementation consistent with the present invention are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on or read from other types of computer program products or computer-readable media, such as secondary storage devices, including hard disks, floppy disks, or CD-ROM; a carrier wave from the Internet or other network; or other forms of RAM or ROM. The computer-readable media may include instructions for controlling the CMS 100 to perform a particular method.

A central part of the SCM module 110 is the ability to execute various management commands or applications on the one or more nodes simultaneously. The commands or applications may need to be encapsulated with an SCM tool, which is typically used to copy files and/or execute commands on the target nodes 130. The SCM tool may run simple commands such as bdf(1) or mount(1M), launch single system interactive applications such as System Administration Manager (SAM) or Glance, launch multi-system aware applications such as Ignite/UX or Software Distributor (SD), or perform other functions. The tool may be defined using either an SCM tool definition language through a command line interface (CLI) or an SCM-provided graphical user interface (GUI).

There are two general types of tools: single-system aware (SSA) tools and multi-system aware (MSA) tools. SSA tools may run on a node 130 and may only affect the operation of that node 130. To run SSA tools on multiple target nodes 130, the SCM module 110 may execute the tools on each target node 130. In addition to executing commands or launching applications, SSA tools may copy files from the CMS 100 to the target nodes 130. Files may only be copied from the CMS 100 to the managed nodes 130 in this exemplary embodiment, not from the nodes 130 back to the CMS 100.

MSA tools may run on a single node 130 but may be able to operate on multiple other nodes 130. MSA tools are applications that execute on a single node but can detect and contact other nodes to accomplish their work, and this contact is out of the control of the SCM module 110. This type of application may need to have a list of nodes 130 passed as an argument at runtime. A node 130 where the application will execute may need to be specified at tool creation time, not at runtime. The target nodes 130 selected by the user may be passed to an MSA tool via a target environment variable that contains a target node list for the MSA tools. MSA tools may not copy files to either the manager node 100 or to the target nodes 130 in this exemplary embodiment. Therefore, an execution command string may be required for MSA tools.

An SCM user may be a user that is known to the SCM module 110 and has some privileges and/or management roles. An SCM role, which is an expression of intent and a collection of tools for accomplishing that intent, typically defines what the user is able to do on the associated nodes 130 or node groups 132, e.g., whether a user may run a tool on a node 130. Typically, in order to start the SCM module 110 or execute any SCM tools, the user may need to be added to the SCM module 110 and authorized either via the GUI or the command line interface (CLI). All SCM module 110 operations may be authorized based on the user's SCM authorization configuration, and/or whether or not the user has been granted SCM trusted user privilege.

The SCM user may, depending upon the roles assigned, manage systems via the SCM module 110. In addition, the user may examine the SCM module log, and scan the group and role configurations. When the SCM user runs a tool, the result may be an SCM task. The SCM module 110 typically assigns a task identifier for every task after it has been defined and before it is run on any target nodes 130. This identifier may be used to track the task and to look up information later about the task in an SCM central log.

An SCM trusted user is an SCM user responsible for the configuration and general administration of the SCM module 110. The trusted user is typically a manager or a supervisor of a group of administrators whom a company trusts, or other trusted individual. Entrusted with the highest authority, the trusted user may do any authorization that is possible, including authorizing himself to execute any system management task with any of the nodes (machines) managed by the SCM module 110. The capabilities of the trusted user include, for example, one or more of the following: creating or modifying a user's security profile; adding, modifying or deleting a node or node group; tool modification; and tool authorization. The granting of these privileges implies a trust that the user is responsible for configuring and maintaining the overall structure of the SCM module 110.

An SCM authorization model supports the notion of assigning to users the ability to run a set of tools on a set of nodes. An authorization object is an association that links a user to a role on either a node or a node group. Each role may have one or more tools and each tool may belong to one or more roles. When users are given the authority to perform some limited set of functionality on one or more nodes, the authorization is done based upon roles and not on tools. The role allows the sum total of functionality represented by all the tools to be divided into logical sets that correspond to the responsibilities that would be given to the various administrators. Accordingly, there are different roles that may be configured and assigned with authorization. For example, a "backup" role assigned to a backup administrator may contain tools that perform backups, manage scheduled backups, view backup status, and perform other backup functions. On the other hand, a "database" role assigned to a database administrator may have a different set of tools. When a user attempts to run a tool on a node, the user may need to be checked to determine if the user is authorized to fulfill a certain role on the node and if that role contains the tool. Once a user is assigned a role, the user may be given access to any newly created tools that are later added to the role. In the example given above, the backup administrator may be assigned the "backup" role for a group of systems that run a specific application. When new backup tools are created and added to the "backup" role, the backup administrator may immediately be given access to the new tools on the systems.

Figure 2 illustrates the relationships between the user 210, role 220, node 130, tool 240, and authorization 250 objects. User objects 210 represent users 210, role objects 220 represent roles 220, node objects 130 represent nodes 130, tool objects 240 represent tools 240, and authorization objects 250 represent authorizations 250. However, for purposes of this application, these terms are used interchangeably. Each authorization object 250 links a single user object 210 to a single role object 220 and to a single node object 130 (or a node group object 132). Each role object 220 may correspond to one or more tool objects 240, and each tool object 240 may correspond to one or more role objects 220. Each user object 210 may be assigned multiple authorizations 250, as may each role object 220 and each node object 130. For example, Role 1 may contain Tools 1-N, and User 1 may be assigned Roles 1-M by the authorization model on Node 1. Consequently, User 1 may run Tools 1-N on Node 1, based upon the role assigned, Role 1.

Table 1 illustrates an example of a data structure for assigning tools 240 to different roles 220. Each tool 240 may correspond to a single command or application, but a single command may correspond to more than one tool 240 if there are other differences in how the tool 240 runs the command. Table 2 illustrates an example of a data structure for assigning the roles 220 to different users 210 on different nodes 130.

Table 2

Although Figure 2 shows a node authorization, a similar structure exists for a node group 132 authorization. The SCM authorization model may be deployed by using node group 132 authorizations more often than node 130 authorizations. This model makes adding new nodes simpler because by adding a node 130 to an existing group 132, any authorizations associated with the group 132 may be inherited at run-time by the node 130.

The authorization model for determining if a user may execute a tool 240 on a set of nodes 130 may be defined by an "all or none" model. Therefore, the user 210 must have a valid authorization association for each target node 130 to execute the tool 240. If authorization does not exist for even one of the nodes 130, the tool execution fails.

The SCM module 110 may also include security features to secure transactions that transmit across the network. All network transactions may be digitally signed using a public/private key pair. The recipient of network transmissions may be assured of who the transmission came from and that the data was not altered in the transmission. A hostile party on the network may be able to view the transactions, but may not counterfeit or alter them.

Referring to Figure 3, the five separate processes involved in the tool execution may include a client process, a domain manager process, a log manager process, a DTF process and an agent process. Tool execution may start with a request to run a tool on one or more nodes 130 from a user 210 through a client 310. The client 310 is a program that interacts with the user 210 and displays information on the computer systems that reside on the nodes 130. There are two types of client 310: a graphical user interface (GUI) client, which may be named "scmgr", and a command line interface (CLI) client for executing tasks, which may be named "mxexec". Examples will be provided with respect to the CLI client only. A GUI client may function in a similar fashion.

The client 310 may first contact a domain manager 330 to look up user, node, and tool information and check user authorization, then log the progress with a log manager 334. The domain manager 330 is the "brain" of the SCM module 110 and may be connected to the repository 104 for storage of the definitions of all the objects. The log manager 334 may manage a log file and take log requests from the clients 310 and write the requests to the SCM log file (described in detail later). Then, the client 310 may contact a DTF 340 to pass on the task to be executed. The DTF 340 may execute tasks by passing the task definitions and information to agents 370 running on the managed nodes 130. The DTF 340 is the "heart" of all task execution activity in that all of the execution steps must go through the DTF 340. The DTF 340 typically obtains an authorized runnable tool from the clients 310, distributes the tool execution across multiple nodes 130, and returns execution results to the clients 310 and to the user 210. The final process, the agent process, typically involves running the commands on the managed nodes 130. The DTF 340 may provide task manager interfaces 350 that may be called by the clients 310 to perform a task, to cancel or kill a task, or to monitor task status. The DTF 340 may also provide target liaison interfaces 360 that may be used by the agents 370 to communicate with the DTF 340 in order to process assigned tasks.

To start a task on the managed nodes 130, the DTF 340 may package up the task in a task description object, create target liaison objects 360 to track the target nodes 130, and pass them both to the agents 370 on the target nodes 130. The task description object may include task information received from the user, such as the name of the tool to be run, the location of the tool, the nodes on which to run the tool, and required arguments of the tool, if any. The task description object may be serializable, so it may be shipped over the remote call in its entirety. But the target liaison 360 is typically a remote object and so only a remote reference to it may be shipped over with the remote call.

An important part of the task description is the task identifier described above, which may be a unique string value. It may be based upon a 32-bit integer value that will not repeat in over 60 years assuming one new task is created each second.
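
The "over 60 years" figure can be checked with a short calculation. The sketch below assumes a 365.25-day year and, as a further assumption not stated in the text, that only the non-negative half of a signed 32-bit range (2^31 values) is used; the function name `years_until_repeat` is illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def years_until_repeat(bits, tasks_per_second=1.0):
    """Years before a counter with 2**bits distinct values must repeat."""
    return (2 ** bits) / tasks_per_second / SECONDS_PER_YEAR


signed_years = years_until_repeat(31)    # non-negative half of a signed 32-bit range
unsigned_years = years_until_repeat(32)  # full unsigned 32-bit range
print(f"{signed_years:.1f} years (31 bits), {unsigned_years:.1f} years (32 bits)")
```

Either reading comfortably exceeds 60 years at one task per second: roughly 68 years for 2^31 values and roughly 136 years for 2^32 values.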

Figure 4 is a flow chart of a method for executing tools 240 on one or more managed nodes 130 in the SCM module 110. This method may be implemented, for example, in software modules for execution by processor 108. First, the SCM module 110 may receive a request from a user 210 to run a tool on one or more nodes 130 through the client process, step 402. The request may include task information, such as the name of the tool to be run, the location of the tool, the nodes on which to run the tool, and required arguments of the tool, if any. Next, the SCM module 110 may retrieve tool definition, node definition and user definition from the domain manager 330, step 404, and validate the task information received from the user 210, step 406. The domain manager 330, connected to the repository 104, may be contacted to provide tool definition or information about the nodes 130 or the user 210 whenever the clients 310 need to look up a tool 240 or to verify nodes 130. An example of tool definition is described in United States patent application of Lister, Sanchez, Drees, and Finz, entitled "Service Control Manager Tool Definition", and filed on the same day herewith, which is incorporated herein by reference. The validation of the task information may include checking whether the nodes requested are the managed nodes, whether the tool actually exists, and whether the required arguments of the tool are given.
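
The three validation checks of step 406 can be sketched as follows. This is an illustration only: the dictionary layout of the task information and tool definitions, the key names (`"tool"`, `"nodes"`, `"args"`, `"required_args"`), and the function name `validate_task` are assumptions and do not reflect the actual SCM tool definition format.

```python
def validate_task(task, managed_nodes, tool_definitions):
    """Return a list of validation errors; an empty list means the task is valid."""
    errors = []

    # Check 1: every requested node must be a managed node.
    unknown = [n for n in task["nodes"] if n not in managed_nodes]
    if unknown:
        errors.append(f"not managed nodes: {', '.join(unknown)}")

    # Check 2: the tool must actually exist.
    tool = tool_definitions.get(task["tool"])
    if tool is None:
        errors.append(f"unknown tool: {task['tool']}")
    else:
        # Check 3: every required argument of the tool must be supplied.
        missing = [a for a in tool["required_args"] if a not in task.get("args", {})]
        if missing:
            errors.append(f"missing arguments: {', '.join(missing)}")
    return errors


# Illustrative data only.
managed_nodes = {"node1", "node2"}
tool_definitions = {"mount_fs": {"required_args": ["device", "mount_point"]}}

good_task = {"tool": "mount_fs", "nodes": ["node1"],
             "args": {"device": "/dev/dsk/c0t0d0", "mount_point": "/mnt"}}
bad_task = {"tool": "mount_fs", "nodes": ["node1", "node9"], "args": {"device": "/dev/dsk/c0t0d0"}}

good_errors = validate_task(good_task, managed_nodes, tool_definitions)
bad_errors = validate_task(bad_task, managed_nodes, tool_definitions)
print(good_errors, bad_errors)
```

Collecting all errors before returning, rather than stopping at the first one, is a design choice of this sketch so that a user could correct a request in one pass.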

After the request is validated, the SCM module 110 may create a runnable tool object based on the task information and the tool definition, step 408. The runnable tool object may encapsulate the tool 240, the task information received from the user 210, and information that may be picked up from the environment, such as the user's name.

Then the SCM module 110 may need to check whether the user 210 is authorized to run the tool 240 on all of the nodes 130 requested, i.e., whether the user 210 is assigned one or more of the roles 220 associated with the tool 240 on all of the nodes 130. For example, if a user 210 requests to run a tool 240 on two nodes 130, and the user 210 is only authorized to run the tool on one node 130 but not the other, the SCM module 110 will not run the tool 240 on either node, due to the "all or none" authorization model. This user authorization checking may be done by a security manager 332, which may be a subsection of the domain manager 330, step 410.

Once the security manager 332 has made the determination that the user 210 is authorized to run the tool 240 on all of the nodes 130 requested, the security manager 332 may return the information back to the client 310, and the client 310 may pass the runnable tool to the DTF 340, step 412. The DTF 340 may then issue a task identifier based on the runnable tool, step 414, and pass the runnable tool to the agents 370 associated with the nodes 130 to run the tool 240 using POSIX standard interfaces, step 416. POSIX is an IEEE standard, and, as an example, HP-UX is compliant with POSIX. The processes that can be run on a POSIX compliant system may have access to a standard output that prints regular output, and a standard error output that prints error messages. A standard input is how a POSIX process would read input from a user or a file. The POSIX model masks input/output (I/O) operations and makes them look like file operations, reading input from a file on the file system and writing output to a file. Thus standard input, standard output and standard error are three standardized files, and when running a command or program in a POSIX compliant operating system, a user 210 may specify and control what is attached to those three files.
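
The control over the three standard files can be illustrated with Python's standard `subprocess` module. This is a sketch of the general POSIX idea and not the agent's actual implementation; the helper name `run_command` and the small child program are hypothetical. The child reads standard input, writes to standard output and standard error, and exits with a nonzero code, and the caller captures all three results without interpreting them.

```python
import subprocess
import sys


def run_command(argv, stdin_text=""):
    """Run a command and return (exit_code, stdout, stderr) without interpreting them."""
    completed = subprocess.run(argv, input=stdin_text, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


# A tiny child program used only for illustration.
child = (
    "import sys; data = sys.stdin.read(); "
    "sys.stdout.write(data.upper()); sys.stderr.write('warning'); sys.exit(3)"
)

exit_code, out, err = run_command([sys.executable, "-c", child], stdin_text="hello")
print(exit_code, repr(out), repr(err))
```

Here the caller decides what is attached to each of the three files: a string is fed to standard input, and pipes are attached to standard output and standard error so that both streams can be returned alongside the exit code.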

The task manager interface 350 may use running tool objects to perform the tasks, one per task. The DTF 340 may have a hash table that contains references to all the running tool objects that are active. The hash table is a common data structure for providing fast indexing of information by providing an algorithm that computes some type of address based on a hash key. The hash key for the hash table may be the task identifier, a string value generated by the DTF 340 based on the runnable tool that may be guaranteed to be unique.

When the running tool completes its task, the DTF 340 may create a completed task object to contain the final results, and dereference the running tool because the running tool is no longer needed. The completed task object may be a container of status objects. The DTF 340 may have a hash table that contains references to all the completed task objects, including the status information.
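
The two hash tables described above can be sketched with Python dictionaries keyed by the task identifier. The class name `TaskRegistry`, its methods, and the use of a simple counter to generate identifiers are assumptions made for illustration; the text only states that the identifier is a unique string generated by the DTF 340.

```python
import itertools


class TaskRegistry:
    """Tracks running tools and completed tasks by task identifier (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)  # assumption: a counter stands in for the real ID scheme
        self.running = {}     # task identifier -> running tool object
        self.completed = {}   # task identifier -> completed task (status objects)

    def start(self, runnable_tool):
        """Register a running tool and return its unique string identifier."""
        task_id = str(next(self._ids))
        self.running[task_id] = runnable_tool
        return task_id

    def finish(self, task_id, statuses):
        """Drop the reference to the running tool and keep only the final results."""
        self.running.pop(task_id)
        self.completed[task_id] = statuses


registry = TaskRegistry()
first = registry.start({"tool": "bdf", "nodes": ["node1"]})
second = registry.start({"tool": "mount", "nodes": ["node2"]})
registry.finish(first, {"task_state": "MX_TASK_COMPLETE"})
print(sorted(registry.running), sorted(registry.completed))
```

Keeping completed tasks in a separate table means a client that did not invoke the task can still look up its final status after the running tool object has been released.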

The status objects may include an overall task status object and individual target status objects. The overall task status object may include a task state indicator that reports whether the task is completed, failed or cancelled. The references to the runnable tool may be included so that a client that did not invoke the task may look up the definition of the task that was performed. The task state indicator may have one of the values as shown in Table 3:

Value of task state indicator   Meaning

MX_TASK_PENDING                 The task does not yet have sufficient resources in
                                the DTF to run and so it is waiting. No targets
                                have been contacted.

MX_TASK_RUNNING                 The task is now running.

MX_TASK_COMPLETE                The task is complete and it did not fail.

MX_TASK_FAILED                  The task is complete and it failed before any
                                target was contacted or on all targets.

MX_TASK_SOME_FAILURES           The task is complete and it failed on some targets
                                but not on others.

MX_TASK_CANCELLED               The task was cancelled before it could complete on
                                all specified targets. It might have failed on some
                                targets and completed with no failures on others.

Table 3

The individual target status objects may report, for example, whether or not the connection to the node is completed, and whether the execution of the tool on the node is successful. The target status object may contain a target state indicator, a number of files copied count, a failure cause indicator, an exit code value, and a reference to a target output object. The target state indicator may take on the values as shown in Table 4:



Value of target state indicator   Meaning

MX_TARGET_PENDING                 The target has not yet been contacted because
                                  resources are not available in the DTF to start it.

MX_TARGET_COPYING                 The tool has files that need to be copied to the
                                  target and those files are currently being copied.

MX_TARGET_RUNNING                 The command associated with the tool is now being
                                  executed on the target.

MX_TARGET_COMPLETE                The task has completed on the target and it did not
                                  fail. This is the only state in which the target
                                  status object contains a valid exit code value and
                                  a valid reference to a target output object that
                                  contains the resulting output from the execution of
                                  the command associated with the tool.

MX_TARGET_FAILED                  The task has completed on the target and it failed.
                                  The failure cause indicator contains a value that
                                  indicates the cause of the failure.

MX_TARGET_CANCELLED               The task was cancelled on the target. The command
                                  associated with the tool was never executed.

MX_TARGET_KILLED                  The command associated with the tool was running
                                  and was killed before it could complete.

Table 4



If the target state indicator is MX_TARGET_COMPLETE, the target status object may contain a valid value for the command exit code and a valid reference to a target output object, which may contain the exit code, standard output (stdout) and standard error output (stderr) that resulted from running the command associated with the tool 240 on the target node 130. The agent typically returns the exit code unchanged rather than trying to interpret it, because interpreting it may lead to conflicting results.
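
One way the individual target states could be folded into an overall task state is sketched below. The state names follow Tables 3 and 4, but the folding rules in `overall_state` are assumptions made for illustration: the text does not specify how a killed target affects the overall state, and this sketch treats it like a cancellation. The case in which a task fails before any target is contacted is not modeled.

```python
from enum import Enum


class TargetState(Enum):
    PENDING = "MX_TARGET_PENDING"
    COPYING = "MX_TARGET_COPYING"
    RUNNING = "MX_TARGET_RUNNING"
    COMPLETE = "MX_TARGET_COMPLETE"
    FAILED = "MX_TARGET_FAILED"
    CANCELLED = "MX_TARGET_CANCELLED"
    KILLED = "MX_TARGET_KILLED"


class TaskState(Enum):
    PENDING = "MX_TASK_PENDING"
    RUNNING = "MX_TASK_RUNNING"
    COMPLETE = "MX_TASK_COMPLETE"
    FAILED = "MX_TASK_FAILED"
    SOME_FAILURES = "MX_TASK_SOME_FAILURES"
    CANCELLED = "MX_TASK_CANCELLED"


IN_PROGRESS = {TargetState.PENDING, TargetState.COPYING, TargetState.RUNNING}


def overall_state(target_states):
    """Fold per-target states into one task state (rules are assumptions)."""
    states = list(target_states)
    if all(s is TargetState.PENDING for s in states):
        return TaskState.PENDING          # no target has been contacted yet
    if any(s in IN_PROGRESS for s in states):
        return TaskState.RUNNING          # at least one target is still working
    if any(s in (TargetState.CANCELLED, TargetState.KILLED) for s in states):
        return TaskState.CANCELLED        # assumption: killed counts as cancelled
    if all(s is TargetState.FAILED for s in states):
        return TaskState.FAILED           # failed on all targets
    if any(s is TargetState.FAILED for s in states):
        return TaskState.SOME_FAILURES    # failed on some targets only
    return TaskState.COMPLETE


mixed = overall_state([TargetState.COMPLETE, TargetState.FAILED])
print(mixed.value)
```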

The status objects, the target output object and the runnable tool object are all serializable for transport to and from the DTF 340 via remote calls. Using remote calls to the DTF 340, the clients 310 may access these status and output objects and use them to display task and target status to the user 210.

After the DTF 340 passes the runnable tool to the agents 370 associated with the nodes 130, the agents 370 may execute the tool 240, step 418, and collect the target output, including the exit code, the stdout, and the stderr, step 420. Next, the DTF 340 may collect task results or failure reports from the agents 370 for each node 130, step 422, and update each individual target status, step 424.

After all target nodes have completed the execution, the DTF 340 may update the overall task status, step 426. The target liaisons 360 typically keep track of the individual target status by communicating with the agents 370 running on each of the target nodes 130. When all of the running tasks reach the final stage, whether completed, failed or cancelled, the DTF 340 may return the task results or failure reports to the clients 310 and then to the user 210, step 428. The user 210 may monitor and review the task results by displaying on a computer screen, step 432, printing on a printer, step 434, writing to a file, step 436, or writing to a directory of files that contains one file for each node 130 requested, step 438.



Tool execution may involve copying files and/or running commands and programs. If there are
files to be copied from the CMS 100 to the nodes 130, the DTF 340 typically opens the files on the
CMS 100 and reads the contents before contacting any of the multiple target nodes 130, so that errors
may be detected before the target nodes 130 are contacted. If the files cannot be read, the DTF 340
may start a failure process, and return a failure status to the user 210.
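
The read-before-contact behavior described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical FilePreloader helper that is not part of the specification: every file is read into memory first, and the first unreadable file aborts the operation before any target node is involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; the name and signature are assumptions for illustration. */
final class FilePreloader {
    /**
     * Reads every file into memory before any node is contacted.
     * Throws UncheckedIOException on the first unreadable file, so the
     * caller can report a failure status without touching the targets.
     */
    static Map<Path, byte[]> readAll(List<Path> files) {
        Map<Path, byte[]> contents = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                contents.put(file, Files.readAllBytes(file));
            } catch (IOException e) {
                throw new UncheckedIOException("cannot read " + file, e);
            }
        }
        return contents;
    }
}
```

A caller would invoke readAll once per task and only begin contacting nodes if it returns normally.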

The DTF 340 may be multi-threaded in that it may accept multiple, simultaneous requests and
may simultaneously perform multiple tasks on multiple managed nodes 130. There may be limits on the
number of tasks that may be in process at one time and on the total number of node connections that
may be active so as not to overwhelm the resources of the SCM module 110.

First, there may be a limit on the maximum number of simultaneous task executions that may
be enforced by the DTF 340, in order to limit the resource consumption on the server. For example,
if the limit is ten tasks at a time, and the DTF 340 tries to run the eleventh task when there are already
ten tasks running, the eleventh task will wait until one of the ten finishes.
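
One possible way to enforce such a task limit is a counting semaphore. The sketch below is an assumption about how this could be done, not the mechanism the specification prescribes; the TaskLimiter class and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical limiter; one permit corresponds to one running task. */
final class TaskLimiter {
    private final Semaphore slots;

    TaskLimiter(int maxTasks) {
        // Fair semaphore: blocked tasks are admitted in arrival order.
        this.slots = new Semaphore(maxTasks, true);
    }

    /** Blocks until a slot is free; an eleventh task would wait here. */
    void begin() {
        slots.acquireUninterruptibly();
    }

    /** Non-blocking variant: returns false when the limit is already reached. */
    boolean tryBegin() {
        return slots.tryAcquire();
    }

    /** Must be called when a task completes, fails, or is cancelled. */
    void end() {
        slots.release();
    }
}
```

With a limit of ten, ten calls to tryBegin succeed and the eleventh returns false until some running task calls end.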

There may also be a limitation on the maximum number of nodes 130 with which the DTF 340
may communicate at a time for all of the tasks. For example, if the limit is sixteen, and a task needs to
be run on sixty-five different nodes 130, then only sixteen nodes 130 will be contacted by the DTF 340,
and the rest will wait until one or more of the sixteen complete the task, so that there will only be sixteen
nodes 130 running at a time. The purpose is again to control memory resources so that the
CMS 100 will not be overwhelmed by a large number of requests at the same time.
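
The node limit can be illustrated with a fixed-size thread pool. This is a simplified sketch under stated assumptions: it covers a single task rather than all tasks together, the NodeFanOut class and its counters are hypothetical, and a short sleep stands in for the remote call to an agent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fan-out helper that caps simultaneous node connections. */
final class NodeFanOut {
    final AtomicInteger active = new AtomicInteger(); // nodes being contacted right now
    final AtomicInteger peak = new AtomicInteger();   // highest value 'active' ever reached
    final AtomicInteger done = new AtomicInteger();   // nodes that have finished

    /** Contacts every node, but never more than maxConnections at once. */
    void runOnAll(List<String> nodes, int maxConnections) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConnections);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String node : nodes) {
                pending.add(pool.submit(() -> contact(node)));
            }
            for (Future<?> result : pending) {
                result.get(); // wait for this node to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private void contact(String node) {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // stand-in for the remote call to the agent on 'node'
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
            done.incrementAndGet();
        }
    }
}
```

Running this with sixty-five node names and a limit of sixteen completes all sixty-five while the peak counter never exceeds sixteen.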

Task execution is achieved through communication and interaction between the agents 370 and
the target liaison objects 260 on the CMS 100. The target liaison objects 360 may be created by the
DTF 340 to keep track of the corresponding target nodes 130 and establish a one-on-one
communication between the target liaisons 260 on the CMS 100 and the agents 370 running on the
target nodes 130. To create the target liaison object 260, the DTF 340 may initialize the target liaison
object 260 using the passed-in arguments that include the task identifier, the hostname of the target with
which it communicates, the number of files to be copied, and a reference to the running tool. Next, the
DTF 340 may contact the agents 370 running on the target nodes 130 via the RMI registries on the
nodes 130 (described later). The DTF 340 may pass the remote reference, the task definition, and a
digital signature of the passed arguments to the agents 370 associated with the nodes 130. Then the
execution of the task on the target nodes 130 is in the control of the agents 370 running on the nodes
130.

The SCM agents 370 may be the software components that are installed on all the managed
nodes 130 in an SCM cluster and that perform tasks on the nodes 130 on behalf of the DTF 340. The
agents 370 typically communicate with the DTF via Java Remote Method Invocation (RMI) calls and
register singleton objects with the Java RMI registries running on the nodes. Java RMI is a distributed
object model for the Java Platform and extends the Java object model beyond a single virtual machine
address space, so that executable code can be dynamically distributed on demand, including all
necessary code for distributed applications. The term "Java" is a trademark of Sun Microsystems, Inc.
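
A minimal sketch of how an agent might expose a singleton object through a Java RMI registry is given below. The interface ScmAgent, its runTool method, the implementation class, and the binding name "ScmAgent" are all hypothetical; only the java.rmi calls themselves are standard. The register method shows the export and bind steps but needs a running registry to be exercised.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

/** Hypothetical remote interface; the method name and signature are assumptions. */
interface ScmAgent extends Remote {
    String runTool(String taskId, String commandLine) throws RemoteException;
}

/** Hypothetical agent implementation, kept as a single instance per node. */
final class ScmAgentImpl implements ScmAgent {
    private static final ScmAgentImpl INSTANCE = new ScmAgentImpl();

    private ScmAgentImpl() {
    }

    static ScmAgentImpl instance() {
        return INSTANCE;
    }

    @Override
    public String runTool(String taskId, String commandLine) {
        // Placeholder: a real agent would hand the task to a tool runner.
        return "accepted " + taskId;
    }

    /** Exports the singleton and binds it under a well-known name in the given registry. */
    static void register(Registry registry) throws RemoteException {
        Remote stub = UnicastRemoteObject.exportObject(INSTANCE, 0); // 0 selects an anonymous port
        registry.rebind("ScmAgent", stub);
    }
}
```

On a node, register would typically be called once at agent startup with the registry obtained from LocateRegistry; the caller on the CMS side would then look up the same name to obtain a remote reference.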

The execution of the task on the target nodes 130 may start with the agents 370 unpacking the
task information and the tool definition encapsulated within the runnable tool. The agents 370 may be
connected with the corresponding target liaison object 260 at the CMS 100, and therefore may report
any changes, for example, a cancellation, quickly back to the DTF 340.

The agents 370 running on the managed nodes 130 may need to execute tasks with the
minimum amount of invasion, i.e., use the least amount of resources, because the managed nodes may
be web servers or database servers that have other important tasks. Therefore there may be a limit on
the number of simultaneous tasks that can be performed by the agents 370. When a remote call is
made to run a tool 240 on a target node 130, the agent 370 may check to see if there is a tool runner
object in the free list. If there is, the agent 370 may remove the tool runner from the free list, initialize
it, and then, using the task identifier as the key, add it to the active runner list. Next a thread may be
created and passed to the tool runner. The task has now been launched with the tool runner doing most
of the work. On the other hand, if there are no free tool runners, i.e., when the task capacity of the
agent 370 is reached, any subsequent attempts to start new tasks on the agent 370 may result in an
exception back to the DTF 340. The DTF 340 may attempt to run the task on any other pending target
nodes 130 before retrying with the target node 130 that is at its limit. This may allow the task to
continue on other nodes 130 that may be less loaded. If there are no other target nodes 130 on which
to run the task, the DTF 340 may wait a short time, for example, a second, and retry starting the task.
This may continue until the target node 130 completes another task and accepts the new one
or until the user 210 cancels the task. After the tool runner completes the task, the agent 370 may
remove the tool runner from the active list and place it on the free list.
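
The free list and active list handling described above might be sketched as follows. The ToolRunnerPool class, its nested ToolRunner placeholder, and the choice of IllegalStateException to signal that the agent is at capacity are assumptions made for illustration; thread creation is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool of tool runners kept by one agent. */
final class ToolRunnerPool {
    /** Placeholder for the object that does the work of one task. */
    static final class ToolRunner {
        String taskId;
    }

    private final Deque<ToolRunner> free = new ArrayDeque<>();
    private final Map<String, ToolRunner> active = new HashMap<>();

    ToolRunnerPool(int capacity) {
        for (int i = 0; i < capacity; i++) {
            free.push(new ToolRunner());
        }
    }

    /** Moves a runner from the free list to the active list, keyed by task identifier. */
    synchronized ToolRunner start(String taskId) {
        ToolRunner runner = free.poll();
        if (runner == null) {
            // Stand-in for the exception reported back to the caller.
            throw new IllegalStateException("agent at task capacity");
        }
        runner.taskId = taskId; // initialize the runner for this task
        active.put(taskId, runner);
        return runner;
    }

    /** Returns the runner for a finished task to the free list. */
    synchronized void finish(String taskId) {
        ToolRunner runner = active.remove(taskId);
        if (runner != null) {
            runner.taskId = null;
            free.push(runner);
        }
    }

    synchronized int activeCount() {
        return active.size();
    }
}
```

A caller that receives the capacity exception could try other pending nodes first and retry this node after a short wait, in line with the retry behavior described above.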

These limitations, i.e., task limit, node limit and agent limit, may all be customized by the user
210 depending upon the resources available.

An agent status object, parallel to the target status object, may be used to report the status of
the task running on the individual nodes 130. The initial value of the agent status object may be
MX_AGENT_TR_PENDING. After a call is made to run a tool 240 on the node 130, the agent 370
running on the node 130 may first check to see if the tool 240 specifies any files to be copied. If so,
the tool runner may update the agent status value to MX_AGENT_TR_COPYING and then copy the
files into place. Errors that result from copying files may result in a final agent status value of
MX_AGENT_TR_FAILED or MX_AGENT_TR_CANCELLED and a failure may be reported.

If there are no files to copy, or after all such files have been copied, the runner may check the
kill request flag to see if a kill task call has occurred in another thread. If so, the runner may update the
agent status value to MX_AGENT_TR_KILLED and report a failure. If not, the runner may update
the agent status value to MX_AGENT_TR_RUNNING and continue. The tool runner may then run
the commands associated with the tool 240 in a separate process and gather up the exit code, stdout
and stderr.
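
The sequence of agent status values described above can be traced with a small sketch. The enum reuses the status names from the text, while the ToolRunnerFlow class and its boolean parameters are hypothetical. For simplicity a copy error is mapped to MX_AGENT_TR_FAILED only, and the trace stops once the command would start running.

```java
import java.util.ArrayList;
import java.util.List;

/** Agent status values named in the text. */
enum AgentStatus {
    MX_AGENT_TR_PENDING,
    MX_AGENT_TR_COPYING,
    MX_AGENT_TR_RUNNING,
    MX_AGENT_TR_KILLED,
    MX_AGENT_TR_FAILED,
    MX_AGENT_TR_CANCELLED
}

/** Hypothetical walk through the status transitions of one tool runner. */
final class ToolRunnerFlow {
    /** Returns the status values a runner passes through, in order. */
    static List<AgentStatus> trace(boolean hasFiles, boolean copySucceeds, boolean killRequested) {
        List<AgentStatus> seen = new ArrayList<>();
        seen.add(AgentStatus.MX_AGENT_TR_PENDING);
        if (hasFiles) {
            seen.add(AgentStatus.MX_AGENT_TR_COPYING);
            if (!copySucceeds) {
                // Simplification: the text also allows MX_AGENT_TR_CANCELLED here.
                seen.add(AgentStatus.MX_AGENT_TR_FAILED);
                return seen;
            }
        }
        if (killRequested) {
            // The kill flag is checked after copying and before the command starts.
            seen.add(AgentStatus.MX_AGENT_TR_KILLED);
            return seen;
        }
        seen.add(AgentStatus.MX_AGENT_TR_RUNNING);
        return seen;
    }
}
```

For a tool with files to copy and no kill request, the trace is pending, copying, running; with a kill request and no files, it is pending, killed.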

An integral part of the SCM functionality may be the ability to record and maintain a history of
events, by logging both SCM configuration changes and task execution events through the log manager
334. SCM configuration changes may include adding, modifying and deleting users and nodes in the
SCM module 110, and creating, modifying and deleting node groups 132 and tools 240. Task
execution events may include details and intermediate events associated with the running of a tool 240.
The details may include the identity of the user 210 who launched the task, the task identifier, the task
start time, the actual tool and command line with arguments, and the list of target nodes 130. The
intermediate events may include the beginning of a task on a managed node 130, exceptions that
occur in attempting to run a tool 240 on a node 130, and the final result, if any, of the task. The exit
code, stdout and stderr, if they exist, may also be logged.

While the present invention has been described in connection with an exemplary embodiment,
it will be understood that many modifications will be readily apparent to those skilled in the art, and this
application is intended to cover any variations thereof.